Inline deep #3316

nicolasstucki · 2017-10-12T14:53:49Z

Methods that cannot be inlined due to depth (compiling dotty 30 times in a loop):

With default XX:MaxInlineLevel=9 list of 20362 methods
With XX:MaxInlineLevel=35 list of 3015 methods, only recursive methods remaining.

Collected with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

Benchmarks with XX:MaxInlineLevel=9

[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 33786.645 ms/op
[info] # Warmup Iteration   2: 17443.817 ms/op
[info] # Warmup Iteration   3: 15673.122 ms/op
[info] # Warmup Iteration   4: 14657.183 ms/op
[info] # Warmup Iteration   5: 14269.043 ms/op
[info] # Warmup Iteration   6: 14695.653 ms/op
[info] # Warmup Iteration   7: 14047.060 ms/op
[info] # Warmup Iteration   8: 13985.261 ms/op
[info] # Warmup Iteration   9: 13947.841 ms/op
[info] # Warmup Iteration  10: 14382.364 ms/op
[info] Iteration   1: 13818.103 ms/op
[info] Iteration   2: 13930.734 ms/op
[info] Iteration   3: 13996.306 ms/op
[info] Iteration   4: 14177.422 ms/op
[info] Iteration   5: 13915.746 ms/op
[info] Iteration   6: 13768.716 ms/op
[info] Iteration   7: 13849.529 ms/op
[info] Iteration   8: 14156.749 ms/op
[info] Iteration   9: 13592.561 ms/op
[info] Iteration  10: 13648.460 ms/op
[info] Iteration  11: 13645.160 ms/op
[info] Iteration  12: 13986.800 ms/op
[info] Iteration  13: 13579.973 ms/op
[info] Iteration  14: 13595.698 ms/op
[info] Iteration  15: 13753.457 ms/op
[info] Iteration  16: 14118.314 ms/op
[info] Iteration  17: 13578.536 ms/op
[info] Iteration  18: 13624.389 ms/op
[info] Iteration  19: 13540.987 ms/op
[info] Iteration  20: 14298.178 ms/op
[info] 
[info] 
[info] Result "dotty.tools.benchmarks.Worker.compile":
[info]   13828.791 ±(99.9%) 203.119 ms/op [Average]
[info]   (min, avg, max) = (13540.987, 13828.791, 14298.178), stdev = 233.912
[info]   CI (99.9%): [13625.672, 14031.910] (assumes normal distribution)

Benchmarks with XX:MaxInlineLevel=35

[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 30287.934 ms/op
[info] # Warmup Iteration   2: 17105.325 ms/op
[info] # Warmup Iteration   3: 14658.572 ms/op
[info] # Warmup Iteration   4: 14571.616 ms/op
[info] # Warmup Iteration   5: 14009.009 ms/op
[info] # Warmup Iteration   6: 14326.850 ms/op
[info] # Warmup Iteration   7: 13719.054 ms/op
[info] # Warmup Iteration   8: 13633.771 ms/op
[info] # Warmup Iteration   9: 13579.682 ms/op
[info] # Warmup Iteration  10: 14067.770 ms/op
[info] Iteration   1: 13388.887 ms/op
[info] Iteration   2: 13424.087 ms/op
[info] Iteration   3: 13648.308 ms/op
[info] Iteration   4: 13995.807 ms/op
[info] Iteration   5: 13399.279 ms/op
[info] Iteration   6: 13477.962 ms/op
[info] Iteration   7: 13418.966 ms/op
[info] Iteration   8: 13907.012 ms/op
[info] Iteration   9: 13229.703 ms/op
[info] Iteration  10: 13326.266 ms/op
[info] Iteration  11: 13274.452 ms/op
[info] Iteration  12: 13684.152 ms/op
[info] Iteration  13: 13353.853 ms/op
[info] Iteration  14: 13381.231 ms/op
[info] Iteration  15: 13240.378 ms/op
[info] Iteration  16: 13720.528 ms/op
[info] Iteration  17: 13263.029 ms/op
[info] Iteration  18: 13207.083 ms/op
[info] Iteration  19: 13245.706 ms/op
[info] Iteration  20: 13590.566 ms/op
[info] 
[info] 
[info] Result "dotty.tools.benchmarks.Worker.compile":
[info]   13458.863 ±(99.9%) 198.145 ms/op [Average]
[info]   (min, avg, max) = (13207.083, 13458.863, 13995.807), stdev = 228.184
[info]   CI (99.9%): [13260.718, 13657.008] (assumes normal distribution)

Changing XX:MaxRecursiveInlineLevel from 1 (default) to two made the compiler a bit slower.

nicolasstucki · 2017-10-12T14:55:08Z

We can currently not test the performance of this change on the benchmark server. @liufengyun will work on some changes that will make this possible.

nicolasstucki · 2017-10-12T15:00:17Z

Other interesting inlining data for methods too big to inline https://gist.github.com/nicolasstucki/e239b7d331aa152c75259b4273f75201. Unfortunately nothing close to 35 bytes.

smarter · 2017-10-12T15:09:57Z

I would try MaxInlineLevel=18 too.

smarter · 2017-10-12T15:11:12Z

bench/src/main/scala/Benchmarks.scala

@@ -37,7 +37,7 @@ object Bench {
    val libs = System.getProperty("BENCH_CLASS_PATH")

    val opts = new OptionsBuilder()
-               .jvmArgsPrepend("-Xbootclasspath/a:" + libs + ":", "-Xms2G", "-Xmx2G")
+               .jvmArgsPrepend("-Xbootclasspath/a:" + libs + ":", "-Xms2G", "-Xmx2G", "-XX:MaxInlineLevel=35")


This is fine for testing but we shouldn't merge this since it doesn't reflect what 99% of people using dotty through sbt will get.

I there a way to add this to sbt?

nicolasstucki · 2017-10-12T15:20:05Z

MaxInlineLevel=32 was not enough.

retronym · 2017-10-13T04:08:43Z

Unfortunately nothing close to 35 bytes.

I assume you're aware, but for the benefit of others reading this, it is worth noting that that 35 byte threshold (-XX:MaxInlineSize=35) is used in HotSpot to (almost) unconditionally inline a method into its caller. Howver, larger methods are still inlined according to profile data: methods smaller than -XX:FreqInlineSize=325 are inlined if called frequently enough.

From your examples:

dotty.tools.dotc.core.Types$$anon$5::apply (344 bytes)   hot method too big

...would have been eligible for inlining if it were smaller than 325 bytecodes. If the message is just "too big", it means it wasn't a hot call site, and the callee would need to be <= 35 bytes to be inlined.

-XX:MaxInlineLevel=9 controls the depth of stacks of inlining (our trait encoding runs into this more than we would like!).

nicolasstucki · 2017-10-13T12:16:01Z

@retronym, I aware of the other flags but I have not tried them all yet. I tried XX:MaxInlineLevel because our tait also reaches this limit. Changing XX:MaxInlineSize had negative effects. I will try to change XX:FreqInlineSize to see what happens, thanks for the tip :)

nicolasstucki requested a review from smarter October 12, 2017 15:09

smarter reviewed Oct 12, 2017

View reviewed changes

nicolasstucki added the stat:do not merge label Oct 23, 2017

nicolasstucki added 4 commits October 24, 2017 11:56

Make MaxInlineLevel larger to unblock JVM inlining

bf7249f

Make natSize tailrec to avoid "recursive inlining is too deep"

8c42d75

Allow deeper inlines in sbt

132f780

Only cachange the inline limit for the build project

552dbcf

nicolasstucki force-pushed the inline-deep branch from 8a34c71 to 552dbcf Compare October 24, 2017 09:59

nicolasstucki added stat:needs review and removed stat:do not merge labels Oct 24, 2017

nicolasstucki requested a review from allanrenucci October 24, 2017 09:59

allanrenucci approved these changes Oct 24, 2017

View reviewed changes

nicolasstucki merged commit 2b58780 into scala:master Oct 24, 2017

allanrenucci deleted the inline-deep branch October 24, 2017 12:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inline deep #3316

Inline deep #3316

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

smarter commented Oct 12, 2017

Uh oh!

smarter Oct 12, 2017

Uh oh!

nicolasstucki Oct 12, 2017

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

retronym commented Oct 13, 2017 •

edited

Loading

Uh oh!

nicolasstucki commented Oct 13, 2017

Uh oh!

Uh oh!

Inline deep #3316

Inline deep #3316

Uh oh!

Conversation

nicolasstucki commented Oct 12, 2017

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

smarter commented Oct 12, 2017

Uh oh!

smarter Oct 12, 2017

Choose a reason for hiding this comment

Uh oh!

nicolasstucki Oct 12, 2017

Choose a reason for hiding this comment

Uh oh!

nicolasstucki commented Oct 12, 2017

Uh oh!

retronym commented Oct 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicolasstucki commented Oct 13, 2017

Uh oh!

Uh oh!

retronym commented Oct 13, 2017 •

edited

Loading